A Maximum-Entropy Partial Parser for Unrestricted Text
This paper describes a partial parser that assigns syntactic structures to
sequences of part-of-speech tags. The program uses the maximum entropy
parameter estimation method, which allows a flexible combination of different
knowledge sources: the hierarchical structure, parts of speech and phrasal
categories. In effect, the parser goes beyond simple bracketing and recognises
even fairly complex structures. We give accuracy figures for different
applications of the parser. Comment: 9 pages, LaTeX
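The following is a minimal sketch of how a maximum-entropy (conditional log-linear) model combines overlapping knowledge sources. It assumes a much simpler task than the paper's parser: predicting hypothetical B/I/O chunk-boundary tags from the current and previous part-of-speech tags. The feature templates, tag set, toy training data, and the names `features` and `MaxEnt` are illustrative assumptions. The abstract does not give the paper's features, and its parser assigns richer structure than flat tagging. Training here uses plain gradient ascent rather than the iterative scaling often used for max-ent estimation.

```python
import math
from collections import defaultdict

def features(tags, i):
    # Hypothetical feature templates: current POS tag and previous POS tag.
    prev = tags[i - 1] if i > 0 else "<s>"
    return [f"pos={tags[i]}", f"prev={prev}"]

class MaxEnt:
    """p(y | x) is proportional to exp(sum of weights of active (feature, y) pairs)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # weight for each (feature, label) pair

    def probs(self, feats):
        scores = {y: sum(self.w[(f, y)] for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, data, epochs=100, lr=0.5):
        # Gradient ascent on the conditional log-likelihood. Each weight's
        # gradient is observed count minus model-expected count of its
        # feature; at the optimum they are equal (the max-ent constraints).
        for _ in range(epochs):
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    grad[(f, gold)] += 1.0
                    for y in self.labels:
                        grad[(f, y)] -= p[y]
            for key, g in grad.items():
                self.w[key] += lr * g / len(data)

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)

# Toy training data (hypothetical): POS-tag sequences with B/I/O chunk tags.
train = [
    ("DT JJ NN VBD DT NN".split(), "B I I O B I".split()),
    ("NN VBD DT NN".split(), "B O B I".split()),
]
data = [(features(tags, i), chunks[i])
        for tags, chunks in train for i in range(len(tags))]
model = MaxEnt(["B", "I", "O"])
model.train(data)

test = "DT NN VBD".split()
predicted = [model.predict(features(test, i)) for i in range(len(test))]
print(predicted)
```

Adding another knowledge source, such as a phrasal-category feature, only means returning one more string from `features`. The model does not need to assume that the sources are independent.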
Chunk Tagger - Statistical Recognition of Noun Phrases
We describe a stochastic approach to partial parsing, i.e., the recognition
of syntactic structures of limited depth. The technique utilises Markov Models,
but goes beyond usual bracketing approaches, since it is capable of recognising
not only the boundaries, but also the internal structure and syntactic category
of simple as well as complex NPs, PPs, APs and adverbials. We compare
tagging accuracy for different applications and encoding schemes. Comment: 7 pages, LaTeX
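The Markov-model approach can be sketched as Viterbi decoding of a first-order hidden Markov model. Its hidden states are chunk tags and its observations are part-of-speech tags. This setup is simplified and hypothetical: the three-tag scheme (`B-NP`, `I-NP`, `O`) and all probabilities below are invented for illustration. The paper's encoding schemes are richer, since they also capture internal structure and syntactic category.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-6):
    """Most probable hidden-state sequence for obs under a first-order HMM.
    Computed in log space; probabilities missing from the tables get `floor`."""
    def lg(table, key):
        return math.log(table.get(key, 0.0) or floor)

    score = [{s: lg(start_p, s) + lg(emit_p[s], obs[0]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            # Best predecessor r for state s at time t.
            best = max(states, key=lambda r: score[t - 1][r] + lg(trans_p[r], s))
            score[t][s] = (score[t - 1][best] + lg(trans_p[best], s)
                           + lg(emit_p[s], obs[t]))
            back[t][s] = best
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical model parameters.
states = ["B-NP", "I-NP", "O"]
start_p = {"B-NP": 0.7, "O": 0.3}
trans_p = {
    "B-NP": {"B-NP": 0.1, "I-NP": 0.6, "O": 0.3},
    "I-NP": {"B-NP": 0.1, "I-NP": 0.3, "O": 0.6},
    "O": {"B-NP": 0.7, "O": 0.3},
}
emit_p = {
    "B-NP": {"DT": 0.6, "NN": 0.3, "JJ": 0.1},
    "I-NP": {"NN": 0.6, "JJ": 0.4},
    "O": {"VBD": 0.7, "IN": 0.3},
}
path = viterbi("DT JJ NN VBD DT NN".split(), states, start_p, trans_p, emit_p)
print(path)
```

A richer tag set that encodes categories or nesting depth would leave `viterbi` unchanged. Only the state inventory and the probability tables would grow.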
Preference-Driven Bimachine Compilation: An Application to TTS Text Normalisation
This paper describes a grammar formalism and a deterministic parser developed for text normalisation
in the rVoice text-to-speech (TTS) system. The rules are formulated using regular
expressions and converted into a non-deterministic finite-state transducer (FST). At runtime,
search is guided by parsing preferences which the user may associate with regular operators;
the best solution is determined in a way similar to the directional evaluation of constraints in
Optimality Theory. During compilation, the FST is converted into a bimachine, making deterministic
parsing possible.
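A bimachine makes rewriting deterministic by combining three parts: a left-to-right automaton, a right-to-left automaton, and an output function that sees both context states at each position. The minimal sketch below runs such a machine. The decimal-point rule (reading "." as "point" between two digits), the boolean state encoding, and the names `run_bimachine`, `digit_delta` and `decimal_psi` are hypothetical. The sketch does not show the compilation from a preference-annotated FST that the abstract describes.

```python
def run_bimachine(text, left_delta, left_start, right_delta, right_start, psi):
    """Deterministic rewrite with a bimachine (A_L, A_R, psi).

    left[i]  = state of the left-to-right automaton after reading text[:i]
    right[i] = state of the right-to-left automaton after reading text[i:]
               backwards, starting from the end of the string
    Symbol text[i] is rewritten as psi(left[i], text[i], right[i + 1]).
    """
    left = [left_start]
    for ch in text:
        left.append(left_delta(left[-1], ch))
    right = [right_start] * (len(text) + 1)
    for i in range(len(text) - 1, -1, -1):
        right[i] = right_delta(right[i + 1], text[i])
    return "".join(psi(left[i], ch, right[i + 1]) for i, ch in enumerate(text))

# Hypothetical rule: "." between two digits is read as " point ".
# Both automata use a boolean state: "was the last symbol read a digit?"
def digit_delta(state, ch):
    return ch.isdigit()

def decimal_psi(left_state, ch, right_state):
    return " point " if ch == "." and left_state and right_state else ch

out = run_bimachine("pi is 3.14.", digit_delta, False, digit_delta, False, decimal_psi)
print(out)
```

Each pass reads the input once and never backtracks. In this toy rule the final "." stays unchanged because no digit follows it. A real normaliser's automata would encode far richer context than a single boolean.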
Incremental Construction of Minimal Sequential Transducers: The Unsorted-Data Algorithm for Acyclic Sequential Transducers
This paper presents an efficient algorithm for the incremental construction of a minimal acyclic sequential transducer (ST) from a list of input and output strings. The algorithm generalizes a known method of constructing minimal finite-state automata (Daciuk, Mihov, Watson and Watson 2000). Unlike the algorithm published by Mihov and Maurel (2001), it does not require the input strings to be sorted in advance. The algorithm is illustrated by an application in a text-to-speech system.
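The automaton method cited in the abstract can be sketched in its sorted-input form. Words are added one at a time. Whenever a new word diverges from the previous one, the finished suffix path is minimised bottom-up against a register of canonical states. This sketch covers only the Daciuk, Mihov, Watson and Watson (2000) construction for sorted input and plain automata. It is not the paper's unsorted-data algorithm for sequential transducers. The names `State`, `build_minimal_dfa`, `accepts` and `count_states` are illustrative.

```python
class State:
    def __init__(self):
        self.final = False
        self.edges = {}  # symbol -> State

    def signature(self):
        # Equivalence key: finality plus outgoing edges. Children are already
        # canonical (registered) when this is called, so identity suffices.
        return (self.final, tuple((c, id(t)) for c, t in sorted(self.edges.items())))

def build_minimal_dfa(words):
    """Minimal acyclic DFA from lexicographically sorted words."""
    root, register = State(), {}

    def replace_or_register(state):
        sym = max(state.edges)  # with sorted input, the last-added edge
        child = state.edges[sym]
        if child.edges:
            replace_or_register(child)  # minimise bottom-up
        sig = child.signature()
        if sig in register:
            state.edges[sym] = register[sig]  # reuse the equivalent state
        else:
            register[sig] = child

    prev = ""
    for word in words:
        if word < prev:
            raise ValueError("input must be sorted")
        node, i = root, 0
        while i < len(word) and word[i] in node.edges:  # common prefix
            node = node.edges[word[i]]
            i += 1
        if node.edges:
            replace_or_register(node)  # previous word's suffix is now final
        for ch in word[i:]:
            node.edges[ch] = State()
            node = node.edges[ch]
        node.final = True
        prev = word
    if root.edges:
        replace_or_register(root)
    return root

def accepts(root, word):
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final

def count_states(root):
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(s.edges.values())
    return len(seen)

dfa = build_minimal_dfa(["tap", "taps", "top", "tops"])
print(count_states(dfa), accepts(dfa, "tops"), accepts(dfa, "to"))
```

The sorting requirement is what lets `max(state.edges)` identify the path that is safe to minimise. The abstract says the paper's algorithm removes this requirement and works for transducers.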